NSF PAR Search | NSF Public Access Repository

Wasm-R3: Record-Reduce-Replay for Realistic and Standalone WebAssembly Benchmarks

https://doi.org/10.1145/3689787

Baek, Doehyun; Getz, Jakob; Sim, Yusung; Lehmann, Daniel; Titzer, Ben L; Ryu, Sukyoung; Pradel, Michael (October 2024, Proceedings of the ACM on Programming Languages)
Hicks, Michael (Ed.)
WebAssembly (Wasm for short) brings a new, powerful capability to the web as well as Edge, IoT, and embedded systems. Wasm is a portable, compact binary code format with high performance and robust sandboxing properties. As Wasm applications grow in size and importance, the complex performance characteristics of diverse Wasm engines demand robust, representative benchmarks for proper tuning. Stopgap benchmark suites, such as PolyBenchC and libsodium, continue to be used in the literature, though they are known to be unrepresentative. Porting of more complex suites remains difficult because Wasm lacks many system APIs and extracting real-world Wasm benchmarks from the web is difficult due to complex host interactions. To address this challenge, we introduce Wasm-R3, the first record and replay technique for Wasm. Wasm-R3 transparently injects instrumentation into Wasm modules to record an execution trace from inside the module, then reduces the execution trace via several optimizations, and finally produces a replay module that is executable standalone without any host environment, on any engine. The benchmarks created by our approach are (i) realistic, because the approach records real-world web applications, (ii) faithful to the original execution, because the replay benchmark includes the unmodified original code, only adding emulation of host interactions, and (iii) standalone, because the replay benchmarks run on any engine. Applying Wasm-R3 to web-based Wasm applications in the wild demonstrates the correctness of our approach as well as the effectiveness of our optimizations, which reduce the recorded traces by 99.53% and the size of the replay benchmark by 9.98%. We release the resulting benchmark suite of 27 applications, called Wasm-R3-Bench, to the community, to inspire a new generation of realistic and standalone Wasm benchmarks.
Full Text Available
Fuzz4All: Universal Fuzzing with Large Language Models

https://doi.org/10.1145/3597503.3639121

Xia, Chunqiu Steven; Paltenghi, Matteo; Tian, Jia Le; Pradel, Michael; Zhang, Lingming (April 2024, ACM)

Full Text Available
That’s a Tough Call: Studying the Challenges of Call Graph Construction for WebAssembly

https://doi.org/10.1145/3597926.3598104

Lehmann, Daniel; Thalakottur, Michelle; Tip, Frank; Pradel, Michael (July 2023, ACM)

Full Text Available
VulGen: Realistic Vulnerable Sample Generation via Pattern Mining and Deep Learning

https://doi.org/10.1109/ICSE48619.2023.00211

Nong, Yu; Ou, Yuzhe; Pradel, Michael; Chen, Feng; Cai, Haipeng (May 2023, Proceedings of IEEE/ACM International Conference on Software Engineering (ICSE 2023))

Building new, powerful data-driven defenses against prevalent software vulnerabilities needs sizable, quality vulnerability datasets, as does large-scale benchmarking of existing defense solutions. Automatic data generation would promisingly meet the need, yet there is little work aimed at generating much-needed quality vulnerable samples. Meanwhile, existing similar and adaptable techniques suffer critical limitations for that purpose. In this paper, we present VULGEN, the first injection-based vulnerability-generation technique that is not limited to a particular class of vulnerabilities. VULGEN combines the strengths of deterministic (pattern-based) and probabilistic (deep-learning/DL-based) program transformation approaches while mutually overcoming respective weaknesses. This is achieved through close collaborations between pattern mining/application and DL-based injection localization, which separates the concerns of how and where to inject. By leveraging large, pretrained programming language modeling and only learning locations, VULGEN mitigates its own needs for quality vulnerability data (for training the localization model). Extensive evaluations show that VULGEN significantly outperforms a state-of-the-art (SOTA) pattern-based peer technique as well as both Transformer- and GNN-based approaches in terms of the percentages of generated samples that are vulnerable and those also exactly matching the ground truth (by 38.0–430.1% and 16.3–158.2%, respectively). The VULGEN-generated samples led to substantial performance improvements for two SOTA DL-based vulnerability detectors (by up to 31.8% in F1), close to those brought by the ground-truth real-world samples and much higher than those by the same numbers of existing synthetic samples.
Full Text Available
Generating realistic vulnerabilities via neural code editing: an empirical study

https://doi.org/10.1145/3540250.3549128

Nong, Yu; Ou, Yuzhe; Pradel, Michael; Chen, Feng; Cai, Haipeng (November 2022, Proceedings of ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022))

Full Text Available
Nessie: automatically testing JavaScript APIs with asynchronous callbacks

https://doi.org/10.1145/3510003.3510106

Arteca, Ellen; Harner, Sebastian; Pradel, Michael; Tip, Frank (May 2022, Proceedings of the 44th IEEE/ACM International Conference on Software Engineering, ICSE 2022)

Full Text Available
Wobfuscator: Obfuscating JavaScript Malware via Opportunistic Translation to WebAssembly

https://doi.org/10.1109/SP46214.2022.9833626

Romano, Alan; Lehmann, Daniel; Pradel, Michael; Wang, Weihang (May 2022, Proceedings of the 43rd IEEE Symposium on Security and Privacy)

Full Text Available
ConfProf: White-Box Performance Profiling of Configuration Options

https://doi.org/10.1145/3427921.3450255

Han, Xue; Yu, Tingting; Pradel, Michael (April 2021, Proceedings of the ACM/SPEC International Conference on Performance Engineering)
Full Text Available
Automated program repair

https://doi.org/10.1145/3318162

Le Goues, Claire; Pradel, Michael; Roychoudhury, Abhik (November 2019, Communications of the ACM)

Full Text Available
Test generation for higher-order functions in dynamic languages

https://doi.org/10.1145/3276531

Selakovic, Marija; Pradel, Michael; Karim, Rezwana; Tip, Frank (October 2018, Proceedings of the ACM on Programming Languages)

Full Text Available
